Turkish LVCSR: towards better speech recognition for agglutinative languages

Authors

Kenan Çarki

Petra Geutner

Tanja Schultz

Abstract

The Turkish language belongs to the Turkic family. All members of this family are close to one another in terms of linguistic structure. Their typological similarities are vowel harmony, verb-final word order and agglutinative morphology. The last of these causes very rapid vocabulary growth, which results in a large number of out-of-vocabulary (OOV) words. In this paper we describe our first experiments with a speaker-independent large-vocabulary continuous speech recognition (LVCSR) engine for Modern Standard Turkish and present first results for our Turkish speech recognition system. The currently best system shows very promising results, achieving a 16.9% word error rate. To overcome the OOV problem we propose a morpheme-based approach and the Hypothesis Driven Lexical Adaptation approach. The final Turkish system is integrated into the multilingual recognition engine of the GlobalPhone project.
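The link between agglutination and the OOV rate can be illustrated with a toy calculation. The sketch below is not the authors' system and does not implement Hypothesis Driven Lexical Adaptation; it assumes a hypothetical four-entry suffix list and a naive greedy suffix stripper (`segment`), and the tiny word lists are invented for illustration only. It compares the OOV rate of the same test words when the vocabulary consists of full word forms and when it consists of stem and suffix units.

```python
# Toy illustration only: the suffix list, the segmenter and the word lists
# are assumptions made for this example, not data or methods from the paper.

SUFFIXES = ["ler", "lar", "de", "da"]  # hypothetical, far from complete


def segment(word, suffixes=SUFFIXES):
    """Greedily strip known suffixes from the end of a word.

    Returns the remaining stem followed by the stripped suffixes in their
    original order, each marked with a leading '+'.
    """
    stem = word
    morphs = []
    stripped = True
    while stripped:
        stripped = False
        for suf in suffixes:
            # Keep at least two characters of stem.
            if stem.endswith(suf) and len(stem) > len(suf) + 1:
                morphs.insert(0, "+" + suf)
                stem = stem[: -len(suf)]
                stripped = True
                break
    return [stem] + morphs


def oov_rate(train_units, test_units):
    """Fraction of test units that never occur in the training units."""
    vocab = set(train_units)
    unseen = sum(1 for unit in test_units if unit not in vocab)
    return unseen / len(test_units)


train_words = ["ev", "evler", "evde", "okul", "okulda", "kitaplar"]
test_words = ["evlerde", "okullar", "ev"]

# Word-based vocabulary: every inflected form is a separate entry.
word_oov = oov_rate(train_words, test_words)

# Unit-based vocabulary: stems and suffixes are separate entries.
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]
morph_oov = oov_rate(train_morphs, test_morphs)

print(f"word-based OOV rate: {word_oov:.2f}")
print(f"unit-based OOV rate: {morph_oov:.2f}")
```

In this toy setting the unseen forms "evlerde" and "okullar" are out of vocabulary as whole words, but all of their parts were seen in training. A real system would need a proper morphological analysis that handles vowel harmony and other alternations, which a fixed suffix list does not.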

Related resources

Turkish LVCSR: Database Preparation and Language Modeling for an Agglutinative Language

Turkish is an agglutinative language. It is possible to produce a very large number of words from the same root with suffixes [1]. Language modeling for agglutinative languages needs to differ from the modeling of languages like English. Such languages also have inflections, but not as many as an agglutinative language. Techniques which can be used for modeling agglutinative languages ...

Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting

In agglutinative languages like Korean, words are formed by joining affix morphemes to the stem, which leads to a high OOV rate in dictionary building. Hence, subword units are usually used as basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) or LVCSR-based applications such as keyword spotting. In this work, firstly a new word property called coalescence t...

On lexicon creation for Turkish LVCSR

In this paper, we address the lexicon design problem in Turkish large vocabulary speech recognition. Although we focus only on Turkish, the methods described here are general enough that they can be considered for other agglutinative languages like Finnish, Korean etc. In an agglutinative language, several words can be created from a single root word using a rich collection of morphological rul...

On morph-based LVCSR improvements

Efficient large vocabulary continuous speech recognition of morphologically rich languages is a big challenge due to the rapid vocabulary growth. To improve the results, various subword units called morphs are applied as basic language elements. The improvements over the word baseline, however, range from negative to a halving of the error rate across languages and tasks. In this paper we make ...

The GlobalPhone Project: Multilingual LVCSR with JANUS-3

This paper describes our recent effort in developing the GlobalPhone database for multilingual large vocabulary continuous speech recognition. In particular we present the current status of the GlobalPhone corpus containing high-quality speech data for the 9 languages Arabic, Chinese, Croatian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. We also discuss the JANUS-3 toolkit and ho...

Publication date: 2000
